Memory ordering is a group of properties of modern microprocessors that characterises how they may reorder memory operations. It is a type of out-of-order execution. Reordering memory operations lets a processor make fuller use of its caches and memory banks.
On most modern uniprocessors, memory operations are not executed in the order specified by the program code. From the programmer's point of view, however, all operations appear to have been executed in the order specified, with all inconsistencies hidden by the hardware.
There are several memory-consistency models for SMP systems:
On some CPUs atomic operations can be reordered with Loads and Stores.
Also, some architectures can reorder dependent loads or have an incoherent instruction cache pipeline.
Type | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA64 | zSeries |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Loads reordered after Loads | Y | Y | Y | Y | Y | | | | Y | | Y | |
Loads reordered after Stores | Y | Y | Y | Y | Y | | | | Y | | Y | |
Stores reordered after Stores | Y | Y | Y | Y | Y | Y | | | Y | | Y | |
Stores reordered after Loads | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Atomic reordered with Loads | Y | Y | | Y | Y | | | | | | Y | |
Atomic reordered with Stores | Y | Y | | Y | Y | Y | | | | | Y | |
Dependent Loads reordered | Y | | | | | | | | | | | |
Incoherent Instruction cache pipeline | Y | Y | | Y | Y | Y | Y | Y | Y | | Y | Y |
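The "Stores reordered after Loads" row is the one reordering that every listed architecture permits. A minimal sketch of the classic store-buffering test, assuming C11 atomics and POSIX threads, shows the outcome this reordering would allow and how sequentially consistent atomics rule it out. The names `x`, `y`, `r1`, `r2` and `run_store_buffer_trial` are hypothetical and chosen only for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative names only; none of these come from an existing API. */
static atomic_int x, y;
static int r1, r2;

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store(&x, 1);      /* sequentially consistent store */
    r1 = atomic_load(&y);     /* this load may not pass the store above */
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store(&y, 1);
    r2 = atomic_load(&x);
    return NULL;
}

/* Runs one trial. Returns 1 if both threads read 0 (the outcome that
   store-load reordering would permit), 0 otherwise, -1 on thread failure. */
int run_store_buffer_trial(void) {
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    if (pthread_create(&a, NULL, thread_a, NULL) != 0) return -1;
    if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);    /* joining makes r1 and r2 safe to read */
    pthread_join(b, NULL);
    return r1 == 0 && r2 == 0;
}
```

With plain or relaxed accesses in place of the default sequentially consistent ones, a trial could return 1 on any of the architectures in the table; whether it does in practice depends on timing.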
Some older x86 and AMD systems have weaker memory ordering.[3]
SPARC memory ordering modes: TSO (total store order), PSO (partial store order) and RMO (relaxed memory order).
These barriers prevent a compiler from reordering instructions; they do not prevent reordering by the CPU.
asm volatile("" ::: "memory");
or even
__asm__ __volatile__ ("" ::: "memory");
Either statement forbids the GCC compiler from reordering read and write commands around it.[4]
__memory_barrier()
_ReadWriteBarrier()
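As a minimal sketch of the GCC form in use, assuming a GCC-compatible compiler, the barrier can be wrapped in a macro and placed between two stores whose order matters. `COMPILER_BARRIER`, `publish`, `payload` and `ready` are hypothetical names introduced only for this example.

```c
/* GCC/Clang-style compiler barrier; the macro name is illustrative. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int payload;
static int ready;

/* Writes the payload, then the flag. The barrier stops the compiler from
   moving either store across it; the CPU may still reorder them. */
void publish(int value) {
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}

int get_payload(void) { return payload; }
int is_ready(void) { return ready; }
```

Because this constrains only the compiler, code shared between processors would still need one of the hardware barriers on weakly ordered machines.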
Many architectures with SMP support have special hardware instructions for flushing reads and writes.
lfence (asm), void _mm_lfence(void)
sfence (asm), void _mm_sfence(void) [8]
mfence (asm), void _mm_mfence(void) [9]
sync (asm)
dcs (asm)
dmb (asm)
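The instructions above could be wrapped in a single helper that selects one per target. This is a sketch under assumptions: it relies on GCC-style inline assembly and predefined macros, `hw_full_barrier` and `store_then_flag` are hypothetical names, and the choice of `dmb ish` on ARM and the builtin fallback are illustrative decisions rather than anything the text above prescribes.

```c
/* hw_full_barrier is an illustrative name, not an existing API. */
static inline void hw_full_barrier(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");   /* x86 full fence */
#elif defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("sync" ::: "memory");     /* POWER heavyweight sync */
#elif defined(__aarch64__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    __asm__ __volatile__("dmb ish" ::: "memory");  /* ARM data memory barrier */
#else
    __sync_synchronize();                          /* GCC/Clang builtin fallback */
#endif
}

static int shared_data;
static int shared_flag;

/* Stores the data, issues a full barrier, then sets the flag. */
void store_then_flag(int value) {
    shared_data = value;
    hw_full_barrier();
    shared_flag = 1;
}
```

The "memory" clobber on each statement makes the helper act as a compiler barrier as well, so neither the compiler nor the CPU moves the two stores across it.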
Some compilers support builtins (aka intrinsics) that emit hardware memory barrier instructions.
__sync_synchronize();
MemoryBarrier();
__machine_r_barrier();
__machine_w_barrier();
__machine_rw_barrier();
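A minimal sketch of how such a builtin might be used, assuming GCC or Clang and POSIX threads: one thread writes a message and raises a flag, another waits for the flag and reads the message, with `__sync_synchronize()` on both sides. Accessing the flag through the `__atomic_*` builtins is an addition made here to keep the example free of data races; `message`, `flag` and `run_message_passing` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

static int message;   /* plain data, published through the flag */
static int flag;      /* accessed only through the __atomic builtins */

static void *producer(void *arg) {
    (void)arg;
    message = 123;
    __sync_synchronize();                           /* full hardware barrier */
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    while (__atomic_load_n(&flag, __ATOMIC_RELAXED) == 0) {
        /* spin until the producer raises the flag */
    }
    __sync_synchronize();                           /* order flag read before data read */
    *out = message;
    return NULL;
}

/* Returns the value the consumer received, or -1 if a thread failed to start. */
int run_message_passing(void) {
    pthread_t p, c;
    int received = -1;
    message = 0;
    __atomic_store_n(&flag, 0, __ATOMIC_RELAXED);
    if (pthread_create(&p, NULL, producer, NULL) != 0) return -1;
    if (pthread_create(&c, NULL, consumer, &received) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return received;
}
```

Without the two barriers, a weakly ordered CPU could let the consumer observe the flag before the message.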
Creates a barrier across which the compiler will not schedule any data access instruction. The compiler may allocate local data in registers across a memory barrier, but not global data.
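The distinction between local and global data can be sketched with a polling loop. This example assumes a GCC-compatible compiler and uses the empty `asm` statement as a stand-in for the barrier described; `stop_requested`, `count_until_stop` and `request_stop` are hypothetical names.

```c
static int stop_requested;   /* global: re-read from memory after each barrier */

/* Counts loop iterations until the global flag is set or the limit is hit. */
int count_until_stop(int max_iterations) {
    int iterations = 0;      /* local: the compiler may keep it in a register */
    while (iterations < max_iterations) {
        __asm__ __volatile__("" ::: "memory");   /* compiler barrier */
        if (stop_requested) {
            break;
        }
        iterations++;
    }
    return iterations;
}

void request_stop(void) { stop_requested = 1; }
```

Without the barrier, the compiler would be free to read `stop_requested` once before the loop and never look at it again.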